A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
Abstract
We consider the problem of model selection and estimation in sparse high-dimensional linear regression models with strongly correlated variables. First, we study the theoretical properties of the dual Lasso solution, and we show that joint consideration of the Lasso primal and dual solutions is useful for selecting correlated active variables. Second, we argue that correlation among active predictors is not problematic, and we derive a new, weaker condition on the design matrix, called the Pseudo Irrepresentable Condition (PIC). Third, we present a new variable selection procedure, the Dual Lasso Selector, and we show that PIC is a necessary and sufficient condition for consistent variable selection with the proposed method. Finally, combining the Dual Lasso Selector with Ridge estimation yields even better prediction performance. We call the combination DLSelect+Ridge. We illustrate the DLSelect+Ridge method on a real dataset and compare it with popular existing methods in terms of variable selection and prediction accuracy.
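The two-stage idea in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' reference implementation. It relies on a standard fact: for the Lasso objective 0.5*||y - Xb||^2 + lam*||b||_1, the dual solution is proportional to the residual, and the optimality conditions give |x_j^T r| <= lam for every column, with equality on the active set. Strongly correlated columns reach that bound together even when the primal solution keeps only one of them. The function names, the tolerance `tol`, the penalty values, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Scalar soft-thresholding operator used by coordinate descent.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=300):
    # Plain coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def dual_lasso_select(X, y, lam, tol=1e-3):
    # Select columns whose dual constraint |x_j^T r| <= lam is (nearly) tight.
    # `tol` is an assumed numerical slack, not a value from the paper.
    beta = lasso_cd(X, y, lam)
    corr = np.abs(X.T @ (y - X @ beta))
    return np.flatnonzero(corr >= lam * (1.0 - tol))

def dlselect_ridge(X, y, lam, ridge=1.0):
    # Stage 1: dual-based selection. Stage 2: Ridge refit on the selected set.
    S = dual_lasso_select(X, y, lam)
    XS = X[:, S]
    coef = np.zeros(X.shape[1])
    coef[S] = np.linalg.solve(XS.T @ XS + ridge * np.eye(len(S)), XS.T @ y)
    return S, coef

# Toy data (assumed): columns 0 and 1 are identical and both active.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
X[:, 1] = X[:, 0]
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(50)

S, coef = dlselect_ridge(X, y, lam=5.0, ridge=1.0)
print("selected columns:", S)
print("ridge coefficients:", np.round(coef, 3))
```

In this toy setting the primal Lasso solution places all weight on only one of the duplicated columns, while the dual-based rule keeps both, and the Ridge refit splits the weight between them.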
Similar resources
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: As science, knowledge, and technology evolve, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
Predicting the Potential Habitat Distribution of Crataegus Pontica C. Koch, Using a Combined Modeling Approach in Lorestan Province
Habitat degradation is one of the important causes of plant species extinction. Modeling techniques are widely used for identifying the potential habitats of different plant species. Thus, the purpose of the current study was to determine potential habitats of Zalzalak in Lorestan Province. Species presence data and 23 environmental variables were collected in Lorestan Province. Correlation analysis ...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Efficiency of Different Statistical Analysis Methods in Estimating the Components of the Synthetic Unit Hydrograph for Watersheds in Northern Iran
The present research aimed to compare different methods of statistical analysis and to select the best one for modeling the components of the synthetic unit hydrograph from the physical characteristics of catchments in northern Iran, covering an area of 177000 km2 in Giulan, Mazandaran and Golestan Provinces. To carry out the research, 9 physical charac...